NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Learning 3D Perception from Others' Predictions

Yoo, Jinsu; Feng, Zhenyang; Pan, Tai-Yu; Sun, Yihong; Phoo, Cheng Perng; Chen, Xiangyu; Campbell, Mark; Weinberger, Kilian Q; Hariharan, Bharath; Chao, Wei-Lun (April 2025, International Conference on Learning Representations)

Accurate 3D object detection in real-world environments requires a huge amount of annotated data with high quality. Acquiring such data is tedious and expensive, and often needs repeated effort when a new sensor is adopted or when the detector is deployed in a new environment. We investigate a new scenario to construct 3D object detectors: learning from the predictions of a nearby unit that is equipped with an accurate detector. For example, when a self-driving car enters a new area, it may learn from other traffic participants whose detectors have been optimized for that area. This setting is label-efficient, sensor-agnostic, and communication-efficient: nearby units only need to share the predictions with the ego agent (e.g., car). Naively using the received predictions as ground-truths to train the detector for the ego car, however, leads to inferior performance. We systematically study the problem and identify viewpoint mismatches and mislocalization (due to synchronization and GPS errors) as the main causes, which unavoidably result in false positives, false negatives, and inaccurate pseudo labels. We propose a distance-based curriculum, first learning from closer units with similar viewpoints and subsequently improving the quality of other units' predictions via self-training. We further demonstrate that an effective pseudo label refinement module can be trained with a handful of annotated data, largely reducing the data quantity necessary to train an object detector. We validate our approach on the recently released real-world collaborative driving dataset, using reference cars' predictions as pseudo labels for the ego car. Extensive experiments including several scenarios (e.g., different sensors, detectors, and domains) demonstrate the effectiveness of our approach toward label-efficient learning of 3D perception from other units' predictions.
more » « less
Free, publicly-accessible full text available April 28, 2026
DiffuBox: Refining 3D Object Detection with Point Diffusion

Chen, Xiangyu; Liu, Zhenzhen; Luo, Katie Z; Datta, Siddhartha; Polavaram, Adhitya; Wang, Yan; You, Yurong; Li, Boyi; Pavone, Marco; Chao, Wei-Lun; et al (December 2024, Advances in Neural Information Processing Systems 37 (NeurIPS 2024))

Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, face difficulties in maintaining high performance when applied to domains with differing sensor setups or geographic locations, often resulting in poor localization accuracy due to domain shift. To overcome this challenge, we introduce a novel diffusion-based box refinement approach. This method employs a domain-agnostic diffusion model, conditioned on the LiDAR points surrounding a coarse bounding box, to simultaneously refine the box’s location, size, and orientation. We evaluate this approach under various domain adaptation settings, and our results reveal significant improvements across different datasets, object classes and detectors. Our PyTorch implementation is available at https://github.com/cxy1997/DiffuBox.
more » « less
Full Text Available
Aphid cluster recognition and detection in the wild using deep learning models

https://doi.org/10.1038/s41598-023-38633-5

Zhang, Tianxiao; Li, Kaidong; Chen, Xiangyu; Zhong, Cuncong; Luo, Bo; Grijalva, Ivan; McCornack, Brian; Flippo, Daniel; Sharda, Ajay; Wang, Guanghui (December 2023, Scientific Reports)

Abstract Aphid infestation poses a significant threat to crop production, rural communities, and global food security. While chemical pest control is crucial for maximizing yields, applying chemicals across entire fields is both environmentally unsustainable and costly. Hence, precise localization and management of aphids are essential for targeted pesticide application. The paper primarily focuses on using deep learning models for detecting aphid clusters. We propose a novel approach for estimating infection levels by detecting aphid clusters. To facilitate this research, we have captured a large-scale dataset from sorghum fields, manually selected 5447 images containing aphids, and annotated each individual aphid cluster within these images. To facilitate the use of machine learning models, we further process the images by cropping them into patches, resulting in a labeled dataset comprising 151,380 image patches. Then, we implemented and compared the performance of four state-of-the-art object detection models (VFNet, GFLV2, PAA, and ATSS) on the aphid dataset. Extensive experimental results show that all models yield stable similar performance in terms of average precision and recall. We then propose to merge close neighboring clusters and remove tiny clusters caused by cropping, and the performance is further boosted by around 17%. The study demonstrates the feasibility of automatically detecting and managing insects using machine learning models. The labeled dataset will be made openly available to the research community.
more » « less
Full Text Available
Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery

Luo, Katie; Liu, Zhenzhen; Chen, Xiangyu; You, Yurong; Bainam, Sagie; Phoo, Cheng P; Campbell, Mark; Sun, Wen; Hariharan, Bharath; Weinberger, Kilian Q (December 2023, Advances in Neural Information Processing Systems 36 (NeurIPS 2023))

Recent advances in machine learning have shown that Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences. Although very successful for Large Language Models (LLMs), these advancements have not had a comparable impact in research for autonomous vehicles—where alignment with human expectations can be imperative. In this paper, we propose to adapt similar RL-based methods to unsupervised object discovery, i.e. learning to detect objects from LiDAR points without any training labels. Instead of labels, we use simple heuristics to mimic human feedback. More explicitly, we combine multiple heuristics into a simple reward function that positively correlates its score with bounding box accuracy, i.e., boxes containing objects are scored higher than those without. We start from the detector’s own predictions to explore the space and reinforce boxes with high rewards through gradient updates. Empirically, we demonstrate that our approach is not only more accurate, but also orders of magnitudes faster to train compared to prior works on object discovery. Code is available at https://github.com/katieluo88/DRIFT.
more » « less
Full Text Available
Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery

Luo, Katie Z; Liu, Zhenzhen; Chen, Xiangyu; You, Yurong; Benaim, Sagie; Phoo, Cheng Perng; Campbell, Mark Campbell; Sun, Wen; Hariharan, Bharath; Weinberger, Kilian Q (December 2023, Conference on Neural Information Processing Systems)

Full Text Available
HINDSIGHT IS 20/20: LEVERAGING PAST TRAVERSALS TO AID 3D PERCEPTION

You, Yurong; Luo, Katie Z; Chen, Xiangyu; Chen, Junan; Chao, Wei-Lun; Sun, Wen; Hariharan, Bharath (April 2022, ICLR 2022)

Full Text Available
Hindsight is 20/20: Leveraging Past Traversals to Aid 3D Perception

Yurong, You; Luo, Katie Z.; Chen, Xiangyu; Chen, Junan; Chao, Wei-Lun; Sun, Wen; Hariharan, Bharath; Campbell, Mark; Weinberger, Kilian Q. (June 2022, International Conference on Learning Representations (ICLR))

Full Text Available
Hindsight is 20/20: Leveraging Past Traversals to Aid 3D Perception

You, Yurong; Luo, Katie; Chen, Xiangyu; Chen, Junan; Chao, Wei-Lun; Sun, Wen; Hariharan, Bharath; Campbell, Mark; Weinberger, Kilian Q (April 2022, International Conference on Learning Representations)

Full Text Available
Hindsight is 20/20: Leveraging Past Traversals to Aid 3D Perception

You, Yurong; Luo, Katie Z; Chen Xiangyu; Chen, Junan; Chao, Wei-Lun; Sun, Wen; Hariharan, Bharath; Campbell, Mark; Weinberger, Kilian Q. (April 2022, International Conference on Learning Representations)

Self-driving cars must detect vehicles, pedestrians, and other trafﬁc participants accurately to operate safely. Small, far-away, or highly occluded objects are particularly challenging because there is limited information in the LiDAR point clouds for detecting them. To address this challenge, we leverage valuable information from the past: in particular, data collected in past traversals of the same scene. We posit that these past data, which are typically discarded, provide rich contextual information for disambiguating the above-mentioned challenging cases. To this end, we propose a novel end-to-end trainable Hindsight framework to extract this contextual information from past traversals and store it in an easy-to-query data structure, which can then be leveraged to aid future 3D object detection of the same scene. We show that this framework is compatible with most modern 3D detection architectures and can substantially improve their average precision on multiple autonomous driving datasets, most notably by more than 300% on the challenging cases. Our code is available at https://github.com/YurongYou/Hindsight.
more » « less
Full Text Available
Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions

Diaz, Carlos A; Xia, Youya; You, Yurong; Nino, Jose; Chen, Junan; Monica, Josephine; Chen, Xiangyu; Luo, Katie Z; Wang, Yan; Emond, Marc; et al (July 2022, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition)

Advances in perception for self-driving cars have accel- erated in recent years due to the availability of large-scale datasets, typically collected at specific locations and under nice weather conditions. Yet, to achieve the high safety re- quirement, these perceptual systems must operate robustly under a wide variety of weather conditions including snow and rain. In this paper, we present a new dataset to enable robust autonomous driving via a novel data collection pro- cess — data is repeatedly recorded along a 15 km route un- der diverse scene (urban, highway, rural, campus), weather (snow, rain, sun), time (day/night), and traffic conditions (pedestrians, cyclists and cars). The dataset includes im- ages and point clouds from cameras and LiDAR sensors, along with high-precision GPS/INS to establish correspon- dence across routes. The dataset includes road and object annotations using amodal masks to capture partial occlu- sions and 3D bounding boxes. We demonstrate the unique- ness of this dataset by analyzing the performance of base- lines in amodal segmentation of road and objects, depth estimation, and 3D object detection. The repeated routes opens new research directions in object discovery, contin- ual learning, and anomaly detection. Link to Ithaca365: https://ithaca365.mae.cornell.edu/
more » « less
Full Text Available

« Prev Next »

Search for: All records